{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# FM(Factorized Machine)因子分解机\n",
    "\n",
    "该方法常用的场景是推荐，在snippets-rec中会有FM及其衍生算法的介绍。该算法不仅仅是一个回归算法, 也可以用于分类、排序等等。"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 1 简介\n",
    "\n",
    "### 1.1 FM视为线性模型引入特征交叉\n",
    "- **特征交叉在处理多因素交互作用时是十分必要的.**\n",
    "如预测购买商品的任务中, 年龄=7岁 + 女孩 = 童装连衣裙。\n",
    "\n",
    "- 传统线性回归引入交叉项: 二阶多项式回归\n",
    "$$\n",
    "\\hat{y}(x)=w_0 + \\underbrace{\\sum_{i=1}^p w_i x_i}_{\\text {线性回归 }}+\\underbrace{\\sum_{i=1}^p \\sum_{j=i+1}^p w_{i j} x_i x_j}_{\\text {交叉项 (组合待征) }}\n",
    "$$\n",
    "\n",
    "其中样本$x_i$为样本x的第$i$维特征，$p$为维数.\n",
    "\n",
    "当$X$中包含稀疏特征(如商品类别/商品标签, onehot encode后可能几万个维度)时, 有以下致命弱点:\n",
    "1)交叉项会有维度爆炸;\n",
    "2)参数数目极大, 过拟合;\n",
    "3)当特征有稀疏性时, 交叉项也会有稀疏性而且往往更大, 这会交叉因素很难学习。\n",
    "例如推荐场景下，商品作为特征是非常稀疏的，这会导致同时买过蓝牙耳机和雪地靴的人极少甚至没有. 而实际两个商品的交叉项可能是有意义的(蓝牙耳机搭配户外运动装备)\n",
    "\n",
    "- FM引入了隐含特征, 通过矩阵分解对交叉项进行近似, 解决特征稀疏问题\n",
    "因此FM通过矩阵分解对矩阵$W$进行了低秩近似.$W \\approx V \\cdot V^t$，其中$V \\in \\mathbb{R}^{p \\times k}$.\n",
    "1）稀疏特征场景下, $p$通常比较大, $k$往往小于$p$. FM能有效降低参数数目，防止模型泛化能力不足；\n",
    "\n",
    "2）极端情况, 当k足够大时，FM模型等价于线性回归中引入交叉项。\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "交叉相部分= \\\\\n",
    "& \\sum_{i=1}^{p} \\sum_{j=i+1}^p (\\vec {\\mathbf{v}_i} \\cdot \\vec {\\mathbf{v}_j} ) x_i x_j \\\\\n",
    "= & \\frac{1}{2} \\sum_{i=1}^p \\sum_{j=1}^p (\\vec {\\mathbf{v}_i} \\cdot \\vec {\\mathbf{v}_j} ) x_i x_j-\\frac{1}{2} \\sum_{i=1}^p (\\vec {\\mathbf{v}_i} \\cdot \\vec {\\mathbf{v}_j} ) x_i x_i \\\\\n",
    "= & \\frac{1}{2}\\left(\\sum_{i=1}^p \\sum_{j=1}^p \\sum_{f=1}^k v_{i, f} v_{j, f} x_i x_j-\\sum_{i=1}^p \\sum_{f=1}^k v_{i, f} v_{i, f} x_i x_i\\right) \\\\\n",
    "= & \\frac{1}{2} \\sum_{f=1}^k\\left(\\left(\\sum_{i=1}^{p} v_{i, f} x_i\\right)\\left(\\sum_{j=1}^{p} v_{j, f} x_j\\right)-\\sum_{i=1}^{p} v_{i, f}^2 x_i^{2}\\right) \\\\\n",
    "= & \\frac{1}{2} \\sum_{f=1}^k\\left(\\left ( \\sum_{i=1}^{p} v_{i, f} x_i \\right)^{2}-\\sum_{i=1}^{p} v_{i, f}^2 x_i^{2}\\right)\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "FM解决了暴力引进交叉项在稀疏特征场景下的缺点:\n",
    "1)参数数目降低, 通常$k$都远小于$p$, 增加模型泛化能力;\n",
    "2)通过矩阵分解来低秩近似交叉项稀疏, 保留了模型的表达能力;\n",
    "3)FM还具备二阶多项式回归不具备的能力, 在交叉项极为稀疏甚至全为0的情况下学习交叉项。这其实是低秩近似的功劳。\n",
    "A和i1完全没有相关数据, 在矩阵W上则表现为稀疏. 但基于低秩矩阵乘积表示则会收到其他非稀疏的交叉项影响，例如:\n",
    "B、C等许多和i1有交叉项，同时有和i6有交叉项，因此FM求解结果中，v_i1和v_i6应该是非常相似，而A和i6又有交叉项, 因此A和i1的交叉项也不为0, 交叉项权重接近于A和i6.\n",
    "4)FM在计算方程时的时间复杂度O(kp), 而不是二阶多项式的O(kp^2)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1.2 FM与因式分解\n",
    "FM是对因式分解的扩展, 其表达能力比隐式分解更高.\n",
    "- FM:\n",
    "每一个特征都会有其隐向量, 特征交叉时, 交叉特征的权重为两特征隐向量的内积.\n",
    "\n",
    "- 因式分解:\n",
    "因式分解分解了两组变量的相关关系. 以推荐场景为例, 对用户~物品矩阵进行分解, 可以得到两组隐向量, 分别代表物品和用户.\n",
    "\n",
    "- 因式分解可以看做FM的特殊情况\n",
    "用户U和物品I都以onehot形式作为特征, 则FM可以得到一组隐向量, 包含了用户onehot特征对应的隐向量和物品onehot特征对应的隐向量, 此时能够得到等价于因式分解的表达效果.\n",
    "$$\n",
    "y = w_0 + w_u + w_i + \\mathbb{v_u} \\cdot \\mathbb{v_i}\n",
    "$$\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "下面我们将上述表达式以样本形式矩阵化，方便机器学习和深度学习引擎求解，通过下面几步阐述：\n",
    "step 1)假定共有$n$个样本，即$X \\in \\mathbb{R}^{n \\times p}$，$Y \\in \\mathbb{R}^{n}$\n",
    "$$\n",
    "X = \\begin{bmatrix}\n",
    "x_1^{(1)} & \\dots & x_p^{(1)}\\\\\n",
    " \\vdots \\ & \\ddots \\ & \\vdots \\\\\n",
    "x_1^{(n)} & \\dots & x_p^{(n)} \\\\\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "step 2)$V \\in \\mathbb{R}^{p \\times k}$\n",
    "$$\n",
    "V = \\begin{bmatrix}\n",
    "v_1^{(1)} & \\dots & v_k^{(1)}\\\\\n",
    " \\vdots \\ & \\ddots \\ & \\vdots \\\\\n",
    "v_1^{(p)} & \\dots & v_k^{(p)} \\\\\n",
    "\\end{bmatrix}\n",
    "$$\n",
    "\n",
    "step 3)Y的交叉项$Y_{inter}$\n",
    "\n",
    "$$\n",
    "Y_{inter} = \\sum_{i=1}^{p} \\sum_{j=i+1}^{p} \\langle \\textbf v_i, \\textbf v_j \\rangle \\vec{x_i} \\odot \\vec{x_j}  \\\\\n",
    "= \\frac{1}{2} \\sum_{f=1}^{k} \\Big( \\big(\\sum_{i=1}^{p} v_f^{(i)} \\vec{x_i} \\big)^2 - \\sum_{i=1}^{p}v_f^{(i) 2} \\vec{x_i}^2 \\Big) \\\\\n",
    "= \\frac{1}{2} \\sum_{f=1}^{k}  \\big(\\sum_{i=1}^{p} v_f^{(i)} \\vec{x_i} \\big)^2 -\n",
    "\\frac{1}{2} \\sum_{f=1}^{k}  \\big( \\sum_{i=1}^{p}v_f^{(i) 2} \\vec{x_i}^2 \\big)\n",
    "$$\n",
    "\n",
    "注意：\n",
    "- 此处$\\vec{x_i} $表示向量，$\\vec{x_i} \\in \\mathbb{R}^{n \\times 1}$\n",
    "- $\\odot$表示元素乘法(element wise product)，此处平方运算$\\vec{x_i}^{2}$也表示向量的元素级平方操作.\n",
    "- 记住该公式相减的一部分和第二部分, 后续会用到.\n",
    "\n",
    "step4)Y的交叉项可以利用$XV$进一步简化:\n",
    "$X$和$V$的内积($n*k$维):\n",
    "$$\n",
    "XV = \\begin{bmatrix}\n",
    "\\sum_{i=1}^{p} v_f^{(1)} x_i^{(1)}  & \\dots &  \\sum_{i=1}^{p} v_f^{(k)} x_i^{(1)}\\\\\n",
    " \\vdots \\ & \\ddots \\ & \\vdots \\\\\n",
    "\\sum_{i=1}^{p} v_f^{(1)} x_i^{(n)} & \\dots & \\sum_{i=1}^{(n)} v_f^{(k)} x_i^{(n)} \\\\\n",
    "\\end{bmatrix} =\n",
    "\\begin{bmatrix}\n",
    "S_{1,1}^{(1)}  & \\dots &  S_{1,k}^{(1)}\\\\\n",
    " \\vdots \\ & \\ddots \\ & \\vdots \\\\\n",
    "S_{1,1}^{(n)}  & \\dots & S_{1,k}^{(n)} \\\\\n",
    "\\end{bmatrix}\n",
    "$$\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们注意到:\n",
    "- step3中$Y$求解第一项中求和元素$\\sum_{i=1}^{p} v_f^{(i)}$和 $ \\vec{x_i} $ 的乘积恰好等于$XV$的第$i$列:\n",
    "$$\n",
    "v_f^{(i)} \\vec{x_i} = \\begin{pmatrix}\n",
    "  v_f^{(i)} x_i^{(1)}  \\\\\n",
    "  ... \\\\\n",
    "  v_f^{(i)} x_i^{(n)}\n",
    "\\end{pmatrix}\n",
    "$$\n",
    "\n",
    "- 因此step3中第一项$\\frac{1}{2} \\sum_{f=1}^{k}  \\big(\\sum_{i=1}^{p} v_f^{(i)} \\vec{x_i} \\big)^2$等价于$XV$的元素级平方$(XV)^2 = (XV) \\odot (XV)$.\n",
    "我们定义一个运算表示对矩阵的列求和: $\\sigma_{(1)}(B)$表示对矩阵$B$的列向量求和. 因此有:\n",
    "$$\n",
    "\\frac{1}{2} \\sum_{f=1}^{k}  \\big(\\sum_{i=1}^{p} v_f^{(i)} \\vec{x_i} \\big)^2\n",
    "=\\frac{1}{2} \\sigma_{(1)}((XV) \\odot (XV))\n",
    "$$\n",
    "\n",
    "- 同理step3中的第二项中求和元素$ \\sum_{i=1}^{p}v_f^{(i) 2} \\vec{x_i}^2$可以视为$(X \\odot X) \\cdot (V \\odot V)$的第$i$列, 第二项可以视为$(X \\odot X) \\cdot (V \\odot V)$的列向量求和:\n",
    "\n",
    "$$\n",
    "\\frac{1}{2} \\sum_{f=1}^{k}  \\big( \\sum_{i=1}^{p}v_f^{(i) 2} \\vec{x_i}^2 \\big)\n",
    "=\\frac{1}{2} \\sigma_{(1)}((X \\odot X) \\cdot (V \\odot V))\n",
    "$$\n",
    "\n",
    "因此最终的矩阵表达式为:\n",
    "\n",
    "$$\n",
    "Y = w_0 + X w + \\frac{1}{2} \\sigma_{(1)}((XV) \\odot (XV)) - \\frac{1}{2} \\sigma_{(1)}((X \\odot X) \\cdot (V \\odot V))\n",
    "$$\n",
    "其中:\n",
    "- $w_0$为截距项;\n",
    "- $w$为线性项系数;\n",
    "- $V$为交叉项的隐式特征;\n",
    "- $\\odot$为元素级乘法;\n",
    "- $\\sigma_{(1)}(B)$表示对矩阵$B$的列向量求和\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 3 构造数据\n",
    "\n",
    "构造电商点击数据, 拥有用户id/城市/年龄/性别/商品类型/商品价格等特征, 是否点击为关注的目标."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "outputs": [
    {
     "data": {
      "text/plain": "   user_id  item_type city  sex  age  price  click\n0        1         11   成都    0   44    108      1\n1        2         22   成都    1   28    260      0\n2        3         80   北京    1   19    402      1\n3        4         59   北京    1   51    161      1\n4        5         24   成都    1   52    303      0\n5        6          1   重庆    1   46    778      0\n6        7          2   廊坊    1   41    423      0\n7        8          4   重庆    0   47    341      0\n8        9         87   成都    0   53    562      1\n9       10         34   北京    0   28    496      0",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>user_id</th>\n      <th>item_type</th>\n      <th>city</th>\n      <th>sex</th>\n      <th>age</th>\n      <th>price</th>\n      <th>click</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>11</td>\n      <td>成都</td>\n      <td>0</td>\n      <td>44</td>\n      <td>108</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>22</td>\n      <td>成都</td>\n      <td>1</td>\n      <td>28</td>\n      <td>260</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n      <td>80</td>\n      <td>北京</td>\n      <td>1</td>\n      <td>19</td>\n      <td>402</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4</td>\n      <td>59</td>\n      <td>北京</td>\n      <td>1</td>\n      <td>51</td>\n      <td>161</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5</td>\n      <td>24</td>\n      <td>成都</td>\n      <td>1</td>\n      <td>52</td>\n      <td>303</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>6</td>\n      <td>1</td>\n      <td>重庆</td>\n      <td>1</td>\n      <td>46</td>\n      <td>778</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>7</td>\n      <td>2</td>\n      <td>廊坊</td>\n      <td>1</td>\n      <td>41</td>\n      <td>423</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>8</td>\n      <td>4</td>\n      <td>重庆</td>\n      <td>0</td>\n      <td>47</td>\n      <td>341</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>9</td>\n      <td>87</td>\n      <td>成都</td>\n      <td>0</td>\n      <td>53</td>\n      <td>562</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>10</td>\n      <td>34</td>\n      <td>北京</td>\n      <td>0</td>\n      <td>28</td>\n      <td>496</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.DataFrame(\n",
    "    {\n",
    "        'user_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
    "        'item_type': np.random.randint(0, 100, size=10),\n",
    "        'city': ['成都', '成都', '北京', '北京', '成都', '重庆', '廊坊', '重庆', '成都', '北京'],\n",
    "        'sex': np.random.randint(0, 2, size=10),\n",
    "        'age': np.random.randint(16, 60, size=10),\n",
    "        'price': np.random.randint(100, 1000, size=10),\n",
    "        'click': np.random.randint(0, 2, size=10)\n",
    "    }\n",
    ")\n",
    "data"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "outputs": [
    {
     "data": {
      "text/plain": "   user_id  item_type city  sex  age  price  click  city_code\n0        1         11   成都    0   44    108      1          0\n1        2         22   成都    1   28    260      0          0\n2        3         80   北京    1   19    402      1          1\n3        4         59   北京    1   51    161      1          1\n4        5         24   成都    1   52    303      0          0\n5        6          1   重庆    1   46    778      0          2\n6        7          2   廊坊    1   41    423      0          3\n7        8          4   重庆    0   47    341      0          2\n8        9         87   成都    0   53    562      1          0\n9       10         34   北京    0   28    496      0          1",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>user_id</th>\n      <th>item_type</th>\n      <th>city</th>\n      <th>sex</th>\n      <th>age</th>\n      <th>price</th>\n      <th>click</th>\n      <th>city_code</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>11</td>\n      <td>成都</td>\n      <td>0</td>\n      <td>44</td>\n      <td>108</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>22</td>\n      <td>成都</td>\n      <td>1</td>\n      <td>28</td>\n      <td>260</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n      <td>80</td>\n      <td>北京</td>\n      <td>1</td>\n      <td>19</td>\n      <td>402</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4</td>\n      <td>59</td>\n      <td>北京</td>\n      <td>1</td>\n      <td>51</td>\n      <td>161</td>\n      <td>1</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5</td>\n      <td>24</td>\n      <td>成都</td>\n      <td>1</td>\n      <td>52</td>\n      <td>303</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>6</td>\n      <td>1</td>\n      <td>重庆</td>\n      <td>1</td>\n      <td>46</td>\n      <td>778</td>\n      <td>0</td>\n      <td>2</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>7</td>\n      <td>2</td>\n      <td>廊坊</td>\n      <td>1</td>\n      <td>41</td>\n      <td>423</td>\n      <td>0</td>\n      <td>3</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>8</td>\n      <td>4</td>\n      <td>重庆</td>\n      <td>0</td>\n      <td>47</td>\n      <td>341</td>\n      <td>0</td>\n      <td>2</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>9</td>\n      <td>87</td>\n      <td>成都</td>\n      <td>0</td>\n      <td>53</td>\n      <td>562</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>10</td>\n      <td>34</td>\n      <td>北京</td>\n      <td>0</td>\n      <td>28</td>\n      <td>496</td>\n      <td>0</td>\n      <td>1</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 城市编码\n",
    "data['city_code'] = pd.factorize(data['city'])[0]\n",
    "data"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 4 基于numpy实现\n",
    "\n",
    "参考: [Factorization Machines with libFM](https://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf)\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.preprocessing import StandardScaler\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, random_state=42)\n",
    "scaler = StandardScaler()\n",
    "X = scaler.fit_transform(X)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "class FM:\n",
    "    def __init__(self, k=10, alpha=0.01, l1_ratio=0.5, max_iter=100):\n",
    "        self.k = k\n",
    "        self.alpha = alpha\n",
    "        self.l1_ratio = l1_ratio\n",
    "        self.max_iter = max_iter\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        n, m = X.shape\n",
    "\n",
    "        # 初始化模型参数\n",
    "        self.w0 = np.random.normal()\n",
    "        self.w = np.random.normal(size=(m,))\n",
    "        self.V = np.random.normal(size=(m, self.k))\n",
    "\n",
    "        # 计算交叉项\n",
    "        Xv = X.dot(self.V)\n",
    "        XV = X.T.dot(Xv)\n",
    "\n",
    "        # 迭代优化模型参数\n",
    "        for _ in range(self.max_iter):\n",
    "            y_pred = self.predict(X)\n",
    "            loss = y_pred - y\n",
    "\n",
    "            # 更新偏置项\n",
    "            self.w0 -= self.alpha * np.mean(loss)\n",
    "\n",
    "            # 更新线性项\n",
    "            dw = np.zeros_like(self.w)\n",
    "            for j in range(m):\n",
    "                dw[j] = np.mean(loss * X[:, j]) + self.l1_ratio * np.sign(self.w[j])\n",
    "            self.w -= self.alpha * dw\n",
    "\n",
    "            # 更新交叉项\n",
    "            dV = np.zeros_like(self.V)\n",
    "            for j in range(m):\n",
    "                for f in range(self.k):\n",
    "                    dV[j, f] = np.mean(loss * (X[:, j] * (Xv[:, f] - self.V[j, f] * X[:, j]))) + self.l1_ratio * np.sign(self.V[j, f])\n",
    "            self.V -= self.alpha * dV\n",
    "\n",
    "            # 计算新的交叉项\n",
    "            Xv = X.dot(self.V)\n",
    "            XV = X.T.dot(Xv)\n",
    "\n",
    "    def predict(self, X):\n",
    "        n, m = X.shape\n",
    "\n",
    "        # 计算交叉项\n",
    "        Xv = X.dot(self.V)\n",
    "        XV = X.T.dot(Xv)\n",
    "\n",
    "        # 计算预测值\n",
    "        y_pred = self.w0 + X.dot(self.w) + np.sum(Xv.dot(XV - Xv.T), axis=1) / 2\n",
    "\n",
    "        return y_pred"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 矩阵化实现\n",
    "def fit(self, X, y):\n",
    "    n, m = X.shape\n",
    "\n",
    "    # 初始化模型参数\n",
    "    self.w0 = np.random.normal()\n",
    "    self.w = np.random.normal(size=(m,))\n",
    "    self.V = np.random.normal(size=(m, self.k))\n",
    "\n",
    "    # 计算交叉项\n",
    "    Xv = X.dot(self.V)\n",
    "    XV = X.T.dot(Xv)\n",
    "\n",
    "    # 迭代优化模型参数\n",
    "    for _ in range(self.max_iter):\n",
    "        y_pred = self.predict(X)\n",
    "        loss = y_pred - y\n",
    "\n",
    "        # 更新偏置项\n",
    "        self.w0 -= self.alpha * np.mean(loss)\n",
    "\n",
    "        # 更新线性项\n",
    "        dw = np.mean(np.outer(loss, X), axis=0) + self.l1_ratio * np.sign(self.w)\n",
    "        self.w -= self.alpha * dw\n",
    "\n",
    "        # 更新交叉项\n",
    "        dV = (X[:, :, None] * (Xv[:, None, :] - X[:, :, None] * self.V[None, :, :])).mean(axis=0) * 2 + self.l1_ratio * np.sign(self.V)\n",
    "        self.V -= self.alpha * dV\n",
    "\n",
    "        # 计算新的交叉项\n",
    "        Xv = X.dot(self.V)\n",
    "        XV = X.T.dot(Xv)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 5 基于torch实现"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 5.1 torch实现FM"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "class TorchFM(torch.nn.Module):\n",
    "    def __init__(self, n=None, k=None):\n",
    "        super().__init__()\n",
    "        self.V = torch.nn.Parameter(torch.randn(n, k), requires_grad=True)\n",
    "        self.linear = torch.nn.Linear(n, 1)\n",
    "\n",
    "    def forward(self, x):\n",
    "        out_1 = torch.matmul(x, self.V).pow(2).sum(1, keepdim=True)  #S_1^2\n",
    "        out_2 = torch.matmul(x.pow(2), self.V.pow(2)).sum(1, keepdim=True)  # S_2\n",
    "\n",
    "        out_inter = 0.5 * (out_1 - out_2)\n",
    "        out_linear = self.linear(x)\n",
    "        out = out_inter + out_linear\n",
    "\n",
    "        return out"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "vv = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [
    {
     "data": {
      "text/plain": "tensor([[ 1,  4,  9],\n        [16, 25, 36],\n        [49, 64, 81]])"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vv.pow(2)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 5.2 利用深度学习框架的Embedding层构建稀疏特征的隐向量"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 5.3 稀疏+稠密特征\n",
    "\n",
    "- 实际应用中稠密特征往往不需要构建隐向量$V$。而稀疏特征的隐向量矩阵等价于稀疏特征的Embedding层参数。\n",
    "\n",
    "参考: https://arxiv.org/pdf/1803.05170.pdf\n",
    "https://github.com/shenweichen/DeepCTR/issues/271\n",
    "\n",
    "https://mp.weixin.qq.com/s?__biz=MjM5MzY4NzE3MA==&mid=2247484660&idx=1&sn=41b0fed96beb8c50858ed0e2dd21acbc&source=41#wechat_redirect\n",
    "\n",
    "\n",
    "> In computer vision or natural language understanding, the input\n",
    "data are usually images or textual signals, which are known to be\n",
    "spatially and/or temporally correlated, so DNNs can be applied\n",
    "directly on the raw feature with dense structures. However, in\n",
    "web-scale recommender systems, the input features are sparse, of\n",
    "huge dimension, and present no clear spatial or temporal correlation. Therefore, multi-field categorical form is widely used by\n",
    "related works [9, 31, 37, 40, 46].\n",
    "\n",
    "- 实际应用中稠密向量通常会加入一个全连接层，增加模型的表达能力。\n",
    "在实际实现中，为了加强模型的表达能力，通常会在输入稠密特征时添加一个全连接层（Dense Layer），来对其进行一些非线性变换。这个 Dense Layer 可以帮助模型更好地学习输入稠密特征之间的相互作用，从而提高模型的泛化性能。\n",
    "这已经有Wide&Deep的味道了。比较起来，DeepFM包括FM层和DNN两层，高阶和低阶交互特征concat后进行反向传播（作者说能够使效果更好）。\n",
    "![](images/wide_and_deep_artifact.png)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "\n",
    "\n",
    "class FM(nn.Module):\n",
    "    \"\"\"\n",
    "    FM模型\n",
    "    \"\"\"\n",
    "    def __init__(self, num_sparse_features, num_dense_features, embedding_dim, num_hidden_units=256):\n",
    "        super(FM, self).__init__()\n",
    "\n",
    "        # 初始化稀疏特征的嵌入层\n",
    "        self.sparse_embedding = nn.ModuleList([\n",
    "            nn.Embedding(num_embeddings, embedding_dim) for num_embeddings in num_sparse_features\n",
    "        ])\n",
    "\n",
    "        # 初始化稠密特征的全连接层\n",
    "        self.dense_layer = nn.Sequential(\n",
    "            nn.Linear(num_dense_features, num_hidden_units),\n",
    "            nn.BatchNorm1d(num_hidden_units),\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(num_hidden_units, embedding_dim)\n",
    "        )\n",
    "\n",
    "        # 初始化FM交叉层\n",
    "        self.linear = nn.Linear(embedding_dim * (len(num_sparse_features) + 1), 1)\n",
    "        self.interaction = nn.Linear(embedding_dim, embedding_dim)\n",
    "        self.out = nn.Sigmoid()\n",
    "\n",
    "    # noinspection PyShadowingNames\n",
    "    def forward(self, x_sparse, x_dense):\n",
    "        # 处理稀疏特征\n",
    "        sparse_embeds = [emb(x_sparse[:, i]) for i, emb in enumerate(self.sparse_embedding)]\n",
    "        sparse_embeds = torch.cat(sparse_embeds, dim=1)\n",
    "\n",
    "        # 处理稠密特征\n",
    "        dense_embeds = self.dense_layer(x_dense)\n",
    "\n",
    "        # 计算FM交叉项\n",
    "        x = torch.cat([sparse_embeds, dense_embeds], dim=1)\n",
    "        linear_part = self.linear(x)\n",
    "        square_of_sum = torch.pow(torch.sum(x, dim=1, keepdim=True), 2)\n",
    "        sum_of_square = torch.sum(x * x, dim=1, keepdim=True)\n",
    "        cross_part = square_of_sum - sum_of_square\n",
    "        output = linear_part + 0.5 * cross_part\n",
    "\n",
    "        return self.out(output)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "42\n",
      "Epoch 0, batch 0, loss 0.7551\n",
      "Epoch 1, batch 0, loss 0.7549\n",
      "Epoch 2, batch 0, loss 0.7534\n",
      "Epoch 3, batch 0, loss 0.7136\n",
      "Epoch 4, batch 0, loss 0.6956\n",
      "Epoch 5, batch 0, loss 0.6826\n",
      "Epoch 6, batch 0, loss 0.6796\n",
      "Epoch 7, batch 0, loss 0.6792\n",
      "Epoch 8, batch 0, loss 0.6792\n",
      "Epoch 9, batch 0, loss 0.6790\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "\n",
    "_sparse_numpy = data[['user_id', 'city_code', 'sex', 'item_type']].to_numpy(dtype=np.int)\n",
    "sparse_features = torch.from_numpy(_sparse_numpy)\n",
    "dense_features = torch.from_numpy(data[['age', 'price']].to_numpy(dtype=np.float32))\n",
    "\n",
    "# 假设这些数据集对应的标签是0或1\n",
    "labels = torch.from_numpy(data['click'].to_numpy(dtype=np.float))\n",
    "\n",
    "# 将数据集和标签包装成PyTorch的Dataset\n",
    "dataset = TensorDataset(sparse_features, dense_features, labels)\n",
    "\n",
    "# 创建DataLoader，用于批量加载数据\n",
    "dataloader = DataLoader(dataset, batch_size=32, shuffle=True)\n",
    "\n",
    "# 初始化FM模型\n",
    "fm = FM(num_sparse_features=_sparse_numpy.sum(axis=0) + 1, num_dense_features=dense_features.shape[1], embedding_dim=10)\n",
    "# 定义优化器和损失函数\n",
    "optimizer = torch.optim.Adam(fm.parameters())\n",
    "criterion = nn.BCEWithLogitsLoss()\n",
    "\n",
    "# 训练模型\n",
    "for epoch in range(10):\n",
    "    for i, (x_sparse, x_dense, y) in enumerate(dataloader):\n",
    "        optimizer.zero_grad()\n",
    "        y_pred = fm(x_sparse, x_dense)\n",
    "        loss = criterion(y_pred.view(-1), y.float())\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        if i % 100 == 0:\n",
    "            print(f\"Epoch {epoch}, batch {i}, loss {loss.item():.4f}\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prediction: tensor([[0.],\n",
      "        [0.],\n",
      "        [1.],\n",
      "        [1.],\n",
      "        [0.],\n",
      "        [1.],\n",
      "        [0.],\n",
      "        [0.],\n",
      "        [0.],\n",
      "        [0.]])\n",
      "probability: tensor([[1.1186e-01],\n",
      "        [1.8225e-09],\n",
      "        [9.9998e-01],\n",
      "        [1.0000e+00],\n",
      "        [4.9495e-10],\n",
      "        [1.0000e+00],\n",
      "        [5.1721e-09],\n",
      "        [5.8817e-09],\n",
      "        [1.5257e-09],\n",
      "        [4.6316e-06]])\n"
     ]
    }
   ],
   "source": [
    "y_prob = fm(sparse_features, dense_features)\n",
    "y_pred = (y_prob >= 0.5).to(torch.float32)\n",
    "print('prediction:', y_pred.data)\n",
    "print('probability:', y_prob.data)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}